





- "Unexpected" events requiring attention
  - Different ISAs use the terms differently.
- Exceptions (sometimes called Traps)
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, divide by zero, ...
- Interrupt
  - Comes from an external I/O controller.
- Dealing with them without sacrificing performance is hard.



## An Alternate Mechanism

- Vectored Interrupts
  - Handler address determined by the cause. This is very common in embedded processors.
- Example:

| Cause            | Handler address |
|------------------|-----------------|
| Undefined opcode | C000 0000       |
| Overflow         | C000 0020       |
| ...              | C000 0040       |

- The instructions at each vector address either:
  - Deal with the interrupt.
  - Jump to the real handler.
  - Pass control to the OS.
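The address arithmetic behind a vector table can be sketched as follows. This is a minimal model that assumes vectors start at C000 0000 and are spaced 0x20 bytes apart, as the example addresses suggest (room for eight 32-bit instructions per vector). The cause numbers and the function name `handler_address` are hypothetical; real processors define their own cause encodings.

```python
# Hypothetical vectored-interrupt table, assuming base 0xC0000000 and a
# 0x20-byte stride between vectors (as in the example addresses).
VECTOR_BASE = 0xC000_0000
VECTOR_STRIDE = 0x20

# Hypothetical cause numbers, for illustration only.
CAUSE_CODES = {
    "undefined_opcode": 0,
    "overflow": 1,
}


def handler_address(cause: str) -> int:
    """Return the address the CPU would jump to for the given cause."""
    return VECTOR_BASE + CAUSE_CODES[cause] * VECTOR_STRIDE


for cause in CAUSE_CODES:
    # Prints e.g. "overflow: C0000020"
    print(f"{cause}: {handler_address(cause):08X}")
```

Because the handler address is computed directly from the cause, no software has to decode a cause register before dispatching.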











## **Instruction-Level Parallelism**

- Pipelining: executing multiple instructions in parallel
- Ways to increase ILP:
  - Deeper pipeline
    - Less work per stage  $\Rightarrow$  shorter clock cycle.
  - Multiple issue
    - Replicate pipeline stages  $\Rightarrow$  multiple pipelines.
    - Start multiple instructions per clock cycle.
    - But dependencies reduce this considerably in practice.
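The effect of multiple issue on peak throughput is simple arithmetic, sketched below for an ideal pipeline with no stalls. The 4 GHz, 4-way figures are an illustrative assumption; as noted above, dependencies keep real processors well below these peaks.

```python
def peak_rates(clock_hz: float, issue_width: int) -> tuple[float, float, float]:
    """Upper bounds for an ideal multiple-issue pipeline with no stalls.

    Returns (peak instructions per second, peak IPC, peak CPI).
    """
    peak_ipc = issue_width          # at most issue_width instructions start per cycle
    peak_cpi = 1 / issue_width      # so CPI can drop below 1
    peak_ips = clock_hz * issue_width
    return peak_ips, peak_ipc, peak_cpi


# Hypothetical 4 GHz, 4-way multiple-issue processor.
ips, ipc, cpi = peak_rates(4e9, 4)
print(f"{ips / 1e9:.0f} billion instr/s, IPC = {ipc}, CPI = {cpi}")
```

Once CPI falls below 1, it is usually more natural to quote instructions per cycle (IPC) instead.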







# **Loop Unrolling**

- Loop unrolling is a loop transformation that tries to speed up a program at the expense of its binary size (a space-time tradeoff). It can be applied manually by the programmer or automatically by an optimizing compiler.
- The goal is to increase speed by reducing (or eliminating) the instructions that control the loop, such as pointer arithmetic and "end of loop" tests on every iteration, by reducing branch penalties, and by "hiding latencies, in particular, the delay in reading data from memory". To eliminate this overhead, the loop body is rewritten as a repeated sequence of similar, independent statements.
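The transformation itself can be sketched as below, unrolling a summation loop by a factor of 4 with a remainder loop for lengths that are not a multiple of 4. The function names are hypothetical. Python's interpreter will not show the hardware speedup; the sketch only shows the shape of the rewrite: fewer loop tests and index updates, and four independent accumulators instead of one long chain of dependent additions.

```python
def sum_rolled(a: list[int]) -> int:
    total = 0
    for i in range(len(a)):   # one loop test and index update per element
        total += a[i]
    return total


def sum_unrolled4(a: list[int]) -> int:
    """Same sum with the loop body unrolled by a factor of 4."""
    n = len(a)
    # Independent accumulators, so consecutive additions do not depend on each other.
    s0 = s1 = s2 = s3 = 0
    i = 0
    limit = n - n % 4         # largest multiple of 4 that is <= n
    while i < limit:          # one loop test per 4 elements
        s0 += a[i]
        s1 += a[i + 1]
        s2 += a[i + 2]
        s3 += a[i + 3]
        i += 4
    while i < n:              # remainder loop for the last n % 4 elements
        s0 += a[i]
        i += 1
    return s0 + s1 + s2 + s3


print(sum_rolled(list(range(10))), sum_unrolled4(list(range(10))))
```

Splitting the sum across accumulators reorders the additions; that is harmless for integers, but for floating-point data it can change the rounded result.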







## **Register Renaming**

- Reservation stations and the *reorder buffer* effectively provide *register renaming*.
- On instruction issue to reservation station
  - If operand is available in register file or reorder buffer
    - Copied to reservation station.
    - No longer required in the register; can be overwritten.
  - If operand is not yet available
    - It will be provided to the reservation station by a functional unit.
    - Register update may not be required.
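The effect of renaming can be sketched with an explicit rename table that maps each architectural register to the tag of its latest producer. This is a simplified stand-in for the reservation-station and reorder-buffer mechanism described above, not a model of it; the function name `rename` and the `P0, P1, ...` tags are hypothetical, and the operations themselves are omitted.

```python
from itertools import count


def rename(program):
    """Give every destination a fresh physical tag.

    program: list of (dest, [src, ...]) using architectural register names.
    Returns (tag, [renamed srcs]) per instruction.
    """
    mapping = {}                          # architectural name -> latest tag
    fresh = (f"P{n}" for n in count())    # P0, P1, P2, ...
    renamed = []
    for dest, srcs in program:
        # Sources are resolved first, so they see the value from before this write.
        new_srcs = [mapping.get(r, r) for r in srcs]
        # Each write gets a brand-new tag, so it cannot overwrite a value that
        # an earlier instruction still needs (removes WAR and WAW hazards).
        tag = next(fresh)
        mapping[dest] = tag
        renamed.append((tag, new_srcs))
    return renamed


program = [
    ("R3", ["R3", "R5"]),   # R3 := R3 * R5
    ("R4", ["R3"]),         # R4 := R3 + 1
    ("R3", ["R5"]),         # R3 := R5 + 1
]
for dest, srcs in rename(program):
    print(dest, ":=", srcs)
```

After renaming, the third instruction writes `P2` rather than R3, so it may execute before the first two; only the true dependency (`P1` reads `P0`) still constrains the order.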





- **Speculation** is used to allow execution of future instructions that (may) depend on the speculated instruction:
  - Speculate on the outcome of a conditional branch (branch prediction).
  - Speculate that a store that precedes a load, and whose address is not yet known, does not refer to the same address as the load, allowing the load to be scheduled before the store (*load speculation*).
- Must have (hardware and/or software) mechanisms for:
  - Checking to see if the guess was correct.
  - Recovering from the effects of the instructions that were executed speculatively if the guess was incorrect.
- Ignore and/or buffer exceptions created by speculatively executed instructions until it is clear that they should really occur.
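The check-and-recover cycle for load speculation can be sketched as a toy model: the load runs before an older store, the store's address is resolved afterwards, and the guess is checked. The function name and the dictionary-as-memory representation are hypothetical. Real hardware must also squash every instruction that consumed the wrong value, which this sketch does not model.

```python
def run_with_load_speculation(memory, store_addr, store_value, load_addr):
    """Toy model of a load executed ahead of an older store.

    Returns (value the load finally produces, whether recovery was needed).
    """
    mem = dict(memory)                    # work on a copy of memory

    # 1. Speculate: execute the load early, assuming the store won't alias it.
    speculative_value = mem.get(load_addr, 0)

    # 2. The store's address is now known; perform the store.
    mem[store_addr] = store_value

    # 3. Check the guess: did the store write the address the load read?
    if store_addr == load_addr:
        # 4. Recover: discard the speculative value and re-execute the load.
        return mem[load_addr], True
    return speculative_value, False


mem = {0x100: 1, 0x200: 2}
print(run_with_load_speculation(mem, 0x100, 99, 0x200))  # guess was correct
print(run_with_load_speculation(mem, 0x200, 99, 0x200))  # guess was wrong
```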



#### **Dependencies Review**

- When more than one instruction references a particular location for an operand, either reading it (as an input) or writing it (as an output), executing those instructions in an order different from the original program order can lead to three kinds of *data hazards*:
  - Read-after-write (RAW): A read from a register or memory location must return the value placed there by the last write in program order, not some other write. This is referred to as a **true dependency** or **flow dependency**, and requires the instructions to execute in program order.
  - Write-after-write (WAW): Successive writes to a particular register or memory location must leave that location containing the result of the second write. This can be resolved by squashing (synonyms: cancelling, annulling, mooting) the first write if necessary. WAW dependencies are also known as **output dependencies**.
  - Write-after-read (WAR): A read from a register or memory location must return the last prior value written to that location, not one written after the read in program order. This is the sort of **false dependency** that can be resolved by renaming. WAR dependencies are also known as **anti-dependencies**.
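The three hazard types can be detected for a pair of instructions by comparing destination and source registers, as sketched below. This simplified check looks only at two instructions in program order and ignores any intervening writes; the function name `hazards` and the `(dest, [srcs])` encoding are hypothetical.

```python
def hazards(earlier, later):
    """Classify data hazards between two instructions in program order.

    Each instruction is (dest, [srcs]) with register names as strings.
    Returns a sorted list drawn from {"RAW", "WAR", "WAW"}.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")   # later reads what earlier writes (true/flow dependency)
    if l_dest in e_srcs:
        found.add("WAR")   # later overwrites what earlier reads (anti-dependency)
    if l_dest == e_dest:
        found.add("WAW")   # both write the same location (output dependency)
    return sorted(found)


i1 = ("R3", ["R3", "R5"])   # R3 := R3 * R5
i2 = ("R4", ["R3"])         # R4 := R3 + 1
i3 = ("R3", ["R5"])         # R3 := R5 + 1
print(hazards(i1, i2))  # ['RAW']
print(hazards(i2, i3))  # ['WAR']
print(hazards(i1, i3))  # ['WAR', 'WAW']
```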





We must also deal with *antidependencies*: a later instruction (that executes earlier) produces a value that overwrites a value used as a source by an earlier instruction (that executes later).

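- Example sequence (program order):
  - I1: `R3 := R3 * R5`
  - I2: `R4 := R3 + 1`
  - I3: `R3 := R5 + 1`
- Dependencies in this sequence:
  - True data dependency (RAW on R3): I2 reads the R3 written by I1.
  - Antidependency (WAR on R3): I3 overwrites R3, which I1 and I2 read.
  - Output dependency (WAW on R3): I1 and I3 both write R3.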

- The constraint is similar to that of true data dependencies, except reversed:
  - Instead of the later instruction using a value (not yet) produced by an earlier instruction (*read before write*), the later instruction produces a value that destroys a value that the earlier instruction (has not yet) used (*write before read*).



#### **Fallacies**

- Pipelining is easy:
  - The basic idea is easy.
  - The devil is in the details, e.g., detecting data hazards.
- Pipelining is independent of technology:
  - So why haven't we always done pipelining?
  - More transistors make more advanced techniques feasible.
  - Pipeline-related ISA design needs to take account of technology trends.

#### **Concluding Remarks**

- ISA influences design of datapath and controller
  - Poor ISA design can make pipelining harder.
- Datapath and control influence design of ISA.
- Pipelining improves instruction throughput using parallelism:
  - More instructions completed per second.
  - Latency for each instruction is not reduced.
- Hazards: structural, data, control.
- Multiple issue and dynamic scheduling (ILP)
  - Dependencies limit achievable parallelism.
  - Complexity leads to the power wall.
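The distinction between throughput and latency can be made concrete with ideal cycle counts, sketched below. It assumes a hypothetical 5-stage pipeline with equal stage delays, no hazards, and no pipeline-register overhead; the function names are illustrative.

```python
def pipeline_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: fill it once, then complete one instruction per cycle."""
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


k = 5  # hypothetical 5-stage pipeline
for n in (1, 10, 1000):
    speedup = unpipelined_cycles(n, k) / pipeline_cycles(n, k)
    # Each instruction still takes k cycles; only the completion rate improves.
    print(f"n={n}: latency per instruction = {k} cycles, speedup = {speedup:.2f}")
```

The speedup approaches the number of stages as the instruction count grows, while each instruction's latency stays at `k` cycles.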